
CLAIMS 

1. Data classification apparatus comprising:
an input device for receiving a plurality of training
classified examples and at least one unclassified
example;

a memory for storing the classified and unclassified 
examples;

an output terminal for outputting a predicted 
classification for the at least one unclassified example;
and

a processor for identifying the predicted classification of
the at least one unclassified example
wherein the processor includes:

classification allocation means for allocating potential
classifications to each unclassified example and for
generating a plurality of classification sets, each
classification set containing the plurality of training
classified examples and the at least one unclassified
example with its allocated potential classification;
assay means for determining a strangeness value valid
under the iid assumption for each classification set;
a comparative device for selecting the classification set to
which the most likely allocated potential classification for
the at least one unclassified example belongs, wherein
the predicted classification output by the output
terminal is the most likely allocated classification
according to the strangeness values assigned by the
assay means; and
a strength of prediction monitoring device for
determining a confidence value for the predicted
classification on the basis of the strangeness value
assigned by the assay means to one of the classification
sets to which the second most likely allocated potential
classification of the at least one unclassified example
belongs.
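
As an illustration only, and not as part of the claim language, the classification sets produced by the classification allocation means might be pictured with the following sketch. The function name `generate_classification_sets`, the tuple layout of an example, and the restriction to a single unclassified example are all assumptions made for this example.

```python
from typing import Hashable, List, Sequence, Tuple

# Hypothetical layout: an example is (feature vector, classification).
Example = Tuple[Sequence[float], Hashable]


def generate_classification_sets(
    training: List[Example],
    unclassified: Sequence[float],
    potential_labels: Sequence[Hashable],
) -> List[Tuple[Hashable, List[Example]]]:
    """Build one classification set per potential classification.

    Each set holds every training classified example plus the unclassified
    example paired with one allocated potential classification.
    """
    return [
        (label, list(training) + [(unclassified, label)])
        for label in potential_labels
    ]


training = [([0.0, 0.1], "A"), ([1.0, 0.9], "B")]
sets = generate_classification_sets(training, [0.2, 0.0], ["A", "B"])
for label, examples in sets:
    # One set per potential classification, each one example larger
    # than the training data.
    print(label, len(examples))
```

Each set would then be passed to whatever strangeness measure is in use; the sketch does not fix that measure.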

2. Data classification apparatus as claimed in claim 1,
wherein the processor further includes an example
valuation device which determines individual
strangeness values for each training classified example
and the at least one unclassified example having an
allocated potential classification.



3. Data classification apparatus as claimed in claim 2,
wherein Lagrange multipliers are used to determine the
individual strangeness value.
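
The claim does not state which optimisation problem yields the Lagrange multipliers. As a hedged illustration, assuming they are the multipliers of a standard support vector machine dual problem (the symbols $\alpha_i$, $y_i \in \{-1, +1\}$, kernel $K$ and bound $C$ are assumptions introduced here), the problem would read:

```latex
\max_{\alpha}\; \sum_{i=1}^{n} \alpha_i
  - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n}
    \alpha_i \alpha_j \, y_i y_j \, K(x_i, x_j)
\quad \text{subject to} \quad
0 \le \alpha_i \le C, \qquad \sum_{i=1}^{n} \alpha_i y_i = 0 .
```

Under that assumption, the multiplier $\alpha_i$ of an example could serve as its individual strangeness value: examples with $\alpha_i = 0$ are not support vectors and would count as the least strange.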

4. Data classification apparatus as claimed in claim 2,
wherein the assay means determines a strangeness value
for each classification set in dependence on the 
individual strangeness values of each example. 
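
One way a set-level value might be derived from the individual strangeness values is the rank-based fraction sketched below. This is an illustration under stated assumptions, not the claimed implementation: the function name `set_typicalness`, the convention that the last value belongs to the unclassified example, and the use of a p-value-style fraction (where a larger result means the set is less strange) are not stated in the claim.

```python
from typing import Sequence


def set_typicalness(individual_strangeness: Sequence[float]) -> float:
    """Fraction of examples at least as strange as the unclassified one.

    Assumption: the last entry is the strangeness of the unclassified
    example under its allocated potential classification.
    """
    if not individual_strangeness:
        raise ValueError("at least one strangeness value is required")
    new_value = individual_strangeness[-1]
    at_least_as_strange = sum(
        1 for value in individual_strangeness if value >= new_value
    )
    return at_least_as_strange / len(individual_strangeness)


# The unclassified example is the strangest of four: low typicalness.
print(set_typicalness([0.1, 0.4, 0.2, 0.9]))   # 0.25
# The unclassified example is the least strange of four: high typicalness.
print(set_typicalness([0.1, 0.4, 0.2, 0.05]))  # 1.0
```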



5. Data classification apparatus comprising:
an input device for receiving a plurality of training
classified examples and at least one unclassified 
example; 

a memory for storing the classified and unclassified 
examples;

stored programs including an example classification
program;

an output terminal for outputting a predicted
classification for the at least one unclassified example; 
and 

a processor controlled by the stored programs for 
identifying the predicted classification of the at least one 
unclassified example wherein the processor includes:
classification allocation means for allocating potential
classifications to each unclassified example and for
generating a plurality of classification sets, each
classification set containing the plurality of training
classified examples and the at least one unclassified
example with its allocated potential classification;
assay means for determining a strangeness value valid
under the iid assumption for each classification set;
a comparative device for selecting the classification set to
which the most likely allocated potential classification for
the at least one unclassified example belongs, wherein
the predicted classification output by the output
terminal is the most likely allocated potential
classification according to the strangeness values
assigned by the assay means; and
a strength of prediction monitoring device for 
determining a confidence value for the predicted 
classification on the basis of the strangeness value
assigned by the assay means to one of the classification
sets to which the second most likely allocated potential
classification of the at least one unclassified example 
belongs. 

6. A data classification method comprising:
inputting a plurality of training classified examples and 
at least one unclassified example; 

identifying a predicted classification of the at least one 
unclassified example which includes, 

allocating potential classifications to each unclassified 
example; 

generating a plurality of classification sets, each
classification set containing the plurality of training
classified examples and the at least one unclassified
example with its allocated potential classification;



determining a strangeness value valid under the iid
assumption for each classification set;
selecting the classification set to which the most likely
allocated potential classification for the at least one
unclassified example belongs, wherein the predicted
classification is the most likely allocated potential
classification in dependence on the strangeness values;
determining a confidence value for the predicted
classification on the basis of the strangeness value
assigned to one of the classification sets to which the 
second most likely allocated potential classification for 
the at least one unclassified example belongs; and 
outputting the predicted classification for the at least 
one unclassified example and the confidence value for 
the predicted classification. 
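
The selecting, confidence and outputting steps of the method might be sketched as follows. This is a minimal illustration under assumptions that the claim does not fix: each classification set is taken to have already received a p-value-style value in [0, 1], where a larger value means the allocated classification is more likely, and the confidence is taken as one minus the value of the second most likely classification. The function name `predict_with_confidence` is hypothetical.

```python
from typing import Dict, Hashable, Tuple


def predict_with_confidence(
    set_values: Dict[Hashable, float],
) -> Tuple[Hashable, float]:
    """Return (predicted classification, confidence value).

    set_values maps each allocated potential classification to the value
    assigned to its classification set (assumed: larger = more likely).
    """
    if len(set_values) < 2:
        raise ValueError("at least two potential classifications are required")
    # Rank the potential classifications from most to least likely.
    ranked = sorted(set_values.items(), key=lambda item: item[1], reverse=True)
    predicted_label, _ = ranked[0]
    _, second_value = ranked[1]
    # Assumed convention: confidence depends only on the runner-up's value.
    confidence = 1.0 - second_value
    return predicted_label, confidence


label, confidence = predict_with_confidence({"A": 0.8, "B": 0.05, "C": 0.01})
print(label, confidence)
```

In this sketch the confidence is high when the runner-up classification fits the training data poorly, whatever the value of the winning classification.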






7. A data classification method as claimed in claim 6,
further including determining individual strangeness
values for each training classified example and the at
least one unclassified example having an allocated
potential classification.

8. A data classification method as claimed in any one of the
preceding claims, wherein the selected classification set
is selected without the application of any general rules
determined from the training set.

9. A data carrier on which is stored a classification program
for classifying data by performing the following steps:
generating a plurality of classification sets, each
classification set containing a plurality of training
classified examples and at least one unclassified example
that has been allocated a potential classification;
determining a strangeness value valid under the iid
assumption for each classification set;

selecting the classification set to which the most likely
allocated potential classification for the at least one
unclassified example belongs, wherein the predicted
classification is the most likely allocated potential
classification in dependence on the strangeness values;
and

determining a confidence value for the predicted
classification on the basis of the strangeness value
assigned to one of the classification sets to which the
second most likely allocated potential classification for
the at least one unclassified example belongs.